Fix JVM <clinit> deadlock by removing static final accessor fields #48689
Open
jeet1995 wants to merge 1 commit into Azure:main from
Closing: bridge classes don't allow adding new methods. Proceeding with #48667 (Class.forName with explicit classloader).
e57066d to 66afd43
/azp run java - cosmos - ci
/azp run java - cosmos - tests
/azp run java - cosmos - kafka
/azp run java - cosmos - spark
/azp run java - spring - ci
Azure Pipelines successfully started running 1 pipeline(s).
/azp run java - cosmos - ci
/azp run java - cosmos - tests
/azp run java - cosmos - spark
/azp run java - spring - ci
Azure Pipelines successfully started running 1 pipeline(s).
c8c4732 to 5df3405
/azp run java - cosmos - ci
/azp run java - cosmos - tests
/azp run java - cosmos - kafka
/azp run java - cosmos - spark
/azp run java - spring - ci
Azure Pipelines successfully started running 1 pipeline(s).
Replace all static final accessor fields and inline ImplementationBridgeHelpers calls with uniform private static getter methods across 78 files. This eliminates <clinit>-time class loading that caused permanent deadlocks under concurrent class initialization (JLS 12.4.2).
Also fix CosmosItemSerializer.DEFAULT_SERIALIZER circular <clinit> dependency: create the instance directly instead of cross-referencing DefaultCosmosItemSerializer.DEFAULT_SERIALIZER, which is null during recursive same-thread <clinit>.
Fixes: Azure#48622, Azure#48585
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5df3405 to 2c38b75
/azp run java - cosmos - ci
/azp run java - cosmos - tests
/azp run java - cosmos - kafka
/azp run java - cosmos - spark
/azp run java - spring - ci
Azure Pipelines successfully started running 1 pipeline(s).
Summary
Fixes a JVM-level `<clinit>` deadlock that occurs when multiple threads concurrently trigger Cosmos SDK class loading for the first time. This is a permanent, unrecoverable deadlock that hangs all affected threads indefinitely.
Also fixes a latent `CosmosItemSerializer.DEFAULT_SERIALIZER` null bug caused by circular `<clinit>` dependencies between `CosmosItemSerializer` and `DefaultCosmosItemSerializer`.
Fixes: #48622, #48585
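To make the failure mode concrete, here is a minimal sketch of a class-initialization deadlock. It uses no Cosmos SDK types: `First`, `Second`, and the `Gate` latch are hypothetical names introduced only for illustration. The latch forces both threads to be inside their static initializers before either touches the other class, which is the circular wait on per-class initialization locks that the summary describes.

```java
import java.util.concurrent.CountDownLatch;

final class ClinitDeadlockDemo {

    // Hypothetical helper: holds both threads inside their <clinit>
    // until each has started initializing its own class.
    static final class Gate {
        static final CountDownLatch BOTH_INSIDE = new CountDownLatch(2);

        static void arriveAndWait() {
            BOTH_INSIDE.countDown();
            try {
                BOTH_INSIDE.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static final class First {
        static final int VALUE;
        static {
            Gate.arriveAndWait();
            VALUE = Second.VALUE + 1; // blocks: Second is being initialized by the other thread
        }
    }

    static final class Second {
        static final int VALUE;
        static {
            Gate.arriveAndWait();
            VALUE = First.VALUE + 1; // blocks: First is being initialized by the other thread
        }
    }

    /** Returns true if both initializer threads are still blocked after the timeout. */
    static boolean bothThreadsStuck(long timeoutMillis) {
        Thread t1 = new Thread(() -> System.out.println(First.VALUE));
        Thread t2 = new Thread(() -> System.out.println(Second.VALUE));
        // Daemon threads so the JVM can still exit although they never finish.
        t1.setDaemon(true);
        t2.setDaemon(true);
        t1.start();
        t2.start();
        try {
            t1.join(timeoutMillis);
            t2.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return t1.isAlive() && t2.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("deadlocked: " + bothThreadsStuck(1000));
    }
}
```

Neither thread holds an ordinary monitor that a lock-ordering fix could address. Each is simply waiting for the other's class initialization to complete, so the hang is only avoidable by keeping cross-class references out of static initializers.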
Root Cause
Deadlock
Consuming classes cached accessors in `private static final` fields.
During `<clinit>`, the getter finds the accessor null and calls `initializeAllAccessors()`, which eagerly loads 40+ classes. When two threads enter `<clinit>` of different classes simultaneously, the JVM's per-class initialization locks create a circular wait, which is a permanent deadlock (JLS §12.4.2).
DEFAULT_SERIALIZER null
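The same-thread recursive initialization rule behind this bug can be seen in isolation. The sketch below assumes a superclass/subclass pair, `Base` and `Impl`, which are hypothetical stand-ins and not the real serializer classes. When the subclass is touched first, the superclass initializer runs while the subclass is still mid-initialization and reads a field that has not been assigned yet.

```java
final class RecursiveInitDemo {

    // Hypothetical stand-in for the base type whose default is
    // copied from the implementation class.
    static class Base {
        // Runs during Base.<clinit>. If Impl is already being initialized
        // by this thread, the recursive request is a no-op and SHARED is
        // still at its default value (null).
        static final Base DEFAULT = Impl.SHARED;
    }

    // Hypothetical stand-in for the default implementation.
    static final class Impl extends Base {
        static final Base SHARED = new Impl();
        static final Base INTERNAL = new Impl();
    }

    /** Touches Impl first, then reports whether Base.DEFAULT ended up null. */
    static boolean defaultIsNullWhenImplLoadsFirst() {
        Base touched = Impl.INTERNAL; // starts Impl.<clinit>, which first initializes Base
        return touched != null && Base.DEFAULT == null;
    }

    public static void main(String[] args) {
        System.out.println("Base.DEFAULT is null: " + defaultIsNullWhenImplLoadsFirst());
    }
}
```

If `Base` were touched first instead, `Impl` would initialize fully before `Base.DEFAULT` is assigned and the field would be non-null, which is why a bug of this shape can stay latent until class-loading order changes.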
`CosmosItemSerializer.DEFAULT_SERIALIZER` was assigned from `DefaultCosmosItemSerializer.DEFAULT_SERIALIZER`. When `DefaultCosmosItemSerializer.<clinit>` ran first (e.g., via `INTERNAL_DEFAULT_SERIALIZER` access), the recursive `<clinit>` of `CosmosItemSerializer` read `DefaultCosmosItemSerializer.DEFAULT_SERIALIZER` before it was set (JLS §12.4.2 same-thread recursive init = no-op), resulting in null. This caused `NullPointerException: serializer is null` in `Utils.parse()` and `GatewayAddressCache`.
Fix
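As a rough before/after illustration of the change in access pattern, the sketch below uses hypothetical names (`WidgetAccessor`, `WidgetHelper`, `Consumer`) in place of the real accessor interfaces and `ImplementationBridgeHelpers` inner classes. It is also simplified: the helper here throws when nothing is registered, whereas the real helper is described as falling back to `initializeAllAccessors()`.

```java
import java.util.concurrent.atomic.AtomicReference;

final class AccessorPatternSketch {

    // Hypothetical accessor interface.
    interface WidgetAccessor {
        String describe();
    }

    // Hypothetical stand-in for a helper that caches the accessor
    // in an AtomicReference.
    static final class WidgetHelper {
        private static final AtomicReference<WidgetAccessor> ACCESSOR = new AtomicReference<>();

        static void setWidgetAccessor(WidgetAccessor accessor) {
            ACCESSOR.compareAndSet(null, accessor);
        }

        static WidgetAccessor getWidgetAccessor() {
            WidgetAccessor accessor = ACCESSOR.get();
            if (accessor == null) {
                // Simplification: no eager fallback initialization in this sketch.
                throw new IllegalStateException("accessor not registered");
            }
            return accessor;
        }
    }

    static final class Consumer {
        // Before (the pattern being removed): resolved during Consumer.<clinit>.
        //   private static final WidgetAccessor ACCESSOR = WidgetHelper.getWidgetAccessor();

        // After: no field, so Consumer.<clinit> never touches the helper.
        private static WidgetAccessor widget() {
            return WidgetHelper.getWidgetAccessor();
        }

        static String use() {
            return widget().describe();
        }
    }

    public static void main(String[] args) {
        WidgetHelper.setWidgetAccessor(() -> "widget");
        System.out.println(Consumer.use());
    }
}
```

The design point is that the lookup moves from class-initialization time to call time, so initializing `Consumer` no longer depends on any other class's initialization state.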
1. Uniform static getter pattern
Every accessor is now accessed via a short `private static` getter method, with no fields and no `<clinit>` involvement.
The accessor is already cached inside `ImplementationBridgeHelpers` via `AtomicReference`; the getter adds one volatile read (~1ns) per call, negligible compared with actual Cosmos operations.
2. Break circular `<clinit>` for DEFAULT_SERIALIZER
Scope of changes
- `private static final XxxAccessor` and `private final XxxAccessor` fields in consuming classes
- `private static XxxAccessor xxx() { return getXxxAccessor(); }`
- `ImplementationBridgeHelpers.XxxHelper.getXxxAccessor().method()` inline calls converted to `xxx().method()`
- `com.azure.cosmos`
- `static { initialize(); }` blocks added: `CosmosRequestContext`, `CosmosOperationDetails`, `CosmosDiagnosticsContext`
- `getCosmosAsyncClientAccessor()` → `getCosmosDiagnosticsThresholdsAccessor()` in `CosmosDiagnosticsThresholdsHelper`
- `<clinit>` fix: `CosmosItemSerializer` creates instance directly instead of cross-referencing `DefaultCosmosItemSerializer`. Removed dead `DefaultCosmosItemSerializer.DEFAULT_SERIALIZER` field and its `serializationInclusionModeAwareObjectMapper`.
Documented exceptions (not converted to static getters)
- `DefaultCosmosItemSerializer`: instance field intentionally preserved. This class is instantiated during `CosmosItemSerializer.<clinit>`; the instance field initialization triggers the `initializeAllAccessors()` fallback at exactly the right time (after `super()`, before constructor body).
- `HttpClient.java`: Java interface; Java 8 doesn't support `private static` interface methods. Uses method-local variable.
- `Utils.java`: `ensureItemSerializerAccessor()` uses a CAS/caching pattern with `AtomicReference`. Preserved as-is.
- `BridgeInternal` / `ModelBridgeInternal` / `UtilBridgeInternal`: accessor registration sources, not consumers.
Why this approach
`Class.forName()` in getters
`<clinit>` (JLS §12.4.2 no-op). Also reverts Fabian's intentional removal in PR #28912
`initializeAllAccessors()` from `<clinit>`
`CosmosClientBuilder.static{}`
`ImplementationBridgeHelpers` `<clinit>` entered from within other classes' `<clinit>` via static final accessor fields (demo PR #48697)
`<clinit>` involvement, zero recursive edge cases
Tests
- `concurrentAccessorInitializationShouldNotDeadlock` (invocationCount=5): forks fresh child JVMs, 12 threads via `CyclicBarrier` concurrently triggering `<clinit>` of 6 high-risk classes. 30s timeout catches deadlock.
- `allAccessorClassesMustHaveStaticInitializerBlock`: forked JVM iterates all `*Helper` inner classes, calls each getter, verifies accessor is non-null. Catches missing `static { initialize(); }` blocks.
- `noStaticOrInstanceAccessorFieldsInConsumingClasses`: reflection-based enforcement. Collects all `Accessor` interface types from `ImplementationBridgeHelpers`, scans every class for `static` or `final` fields of those types. Catches reintroduction of dangerous patterns regardless of source formatting.
- `accessorInitialization`: existing test, validates explicit `initializeAllAccessors()` bootstrap path.
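The reflection-based check described for `noStaticOrInstanceAccessorFieldsInConsumingClasses` could look roughly like the sketch below. This is not the PR's test code: `SampleAccessor`, `Offender`, `Clean`, and `findAccessorFields` are hypothetical names, and the real test is described as collecting accessor types from `ImplementationBridgeHelpers` and scanning every class, where this version takes both as parameters.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class AccessorFieldScan {

    // Hypothetical accessor interface type.
    interface SampleAccessor {
    }

    // Hypothetical class that reintroduces the cached-field pattern.
    static final class Offender {
        private static final SampleAccessor CACHED = null;
    }

    // Hypothetical class that follows the getter-only pattern.
    static final class Clean {
        private static SampleAccessor accessor() {
            return null;
        }
    }

    /** Lists "Class.field" for every static or final field whose type is an accessor type. */
    static List<String> findAccessorFields(Collection<? extends Class<?>> accessorTypes,
                                           Class<?>... candidates) {
        List<String> violations = new ArrayList<>();
        for (Class<?> candidate : candidates) {
            // getDeclaredFields() inspects metadata only; it does not run <clinit>.
            for (Field field : candidate.getDeclaredFields()) {
                int modifiers = field.getModifiers();
                boolean staticOrFinal = Modifier.isStatic(modifiers) || Modifier.isFinal(modifiers);
                if (staticOrFinal && accessorTypes.contains(field.getType())) {
                    violations.add(candidate.getSimpleName() + "." + field.getName());
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        System.out.println(findAccessorFields(
            Collections.singleton(SampleAccessor.class), Offender.class, Clean.class));
    }
}
```

Checking declared field types rather than matching source text is what lets such a test catch the pattern regardless of how the field declaration is formatted.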